Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactoring tutorial #440

Open
wants to merge 4 commits into
base: dev
Choose a base branch
from
Open

Conversation

lidiazuin
Copy link
Contributor

No description provided.

@neo-technology-commit-status-publisher
Copy link
Collaborator

This PR includes documentation updates
View the updated docs at https://neo4j-docs-getting-started-440.surge.sh

New pages:

Copy link
Member

@nmervaillie nmervaillie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very valuable addition to the docs.
Just a few comments


The answer for these issues is, therefore, to refactor the property `languages` into a node and connect it to the `Movie` nodes with a new relationship.

== Eliminating duplicated data
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

technically we're not deleting data in this section
rename to "reshaping the data"?

* *In order to perform the query, all `Movie` nodes must be retrieved* -> As the graph scales, the performance of a similar query can be dimished by the way you modeled your data.
* *The name of the language is duplicated in many `Movie` nodes (in this case, all of them)* -> If many nodes share a same property value, it could be a sign that this property value could instead become a new entity, like a node or a relationship, for example.

The answer for these issues is, therefore, to refactor the property `languages` into a node and connect it to the `Movie` nodes with a new relationship.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about a visual to represent the data model before/after?

[source,cypher]
--
MATCH (m:Movie)
MATCH (l:Language)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • this does not work, there is a where clause missing to take only the language nodes matching the movie languages
  • this will work only in small graphs, mid-size graphs and above require sub-transactions
  • a unique constraint is missing on the language identifier
  • In practice we would do all of this in a single query for efficiency

CREATE CONSTRAINT unique_language FOR (n:Language) REQUIRE n.name IS UNIQUE

Note: untested query

:auto <1>
MATCH (m:Movie)
CALL (m) {
  UNWIND m.languages AS language
  MERGE (l:Language {name:language})
  MERGE (m)-[:IN_LANGUAGE]->(l)   
  SET m.languages = null
} IN TRANSACTIONS OF 10000 rows

<1> required in neo4j browser to run nested transactions

One way to improve your current model is to check for duplicate key values and see if you can turn them into another entity, like a node or a relationship.
In this case, both production companies are based in California, so the state could be turned into a node for `State` and be connected to the producer companies via a new relationship `LOCATED_AT`:

image::california.svg[The producer company nodes now have one less property for state and connect to a state node for California, role=popup]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For data consistency, the country property should also move to the State nodes

RETURN m
--

| How many users rated a movie?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A more 'graphy' alternative could be

MATCH (m:Movie)
WHERE m.title = 'Apollo 13'
RETURN COUNT {(:User)-[:RATED]->(m)} AS `Number of reviewers`


This should be the result:

image::query-plan.png[Screenshot of Browser featuring a query plan that shows the number of database hits when you retrieve all person nodes,400,400,role=popup]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the data on the screenshot looks suspicious
I would not expect to have 0 rows in the execution pipeline, unless the DB is empty

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants